Yet Another Song Format

Chris Dragan

Modules are not a professional format for storing music. Their main disadvantage is that they confine the composer to rows. Even if one sets Speed to 1 (i.e. the row advances on every BPM tick), tracking is still closer to programming than to making music. Inserting effects is like programming in machine code, and even programmers prefer assemblers to machine code.

Sceners store music as modules rather than MIDI, since every device plays MIDI differently. MIDI is a musical standard, but a MIDI song has to be recorded in a studio to reach its final form.

Why not make composing easier and give composers an ultimate music standard that combines the advantages of MIDI (freedom in placing notes) and modules (samples included)?

I have decided to present my own song format. I hope it will introduce some "fresh blood" to the music scene. The format is very simple, hence powerful and versatile.

Song data

A song no longer consists of rows and patterns, and this is revolutionary. Song data is a stream (sequence) of events. Each event is described by a doubleword (4 bytes). An event can be a note, an effect or something else (e.g. a user-defined event).

Events are grouped into packets. All events in a packet correspond to a single moment in time. This means that all notes and effects that would share a row in a typical module now share a packet.

Each packet begins with the following 4-byte structure:

   struct _PACKET {
      word NumEvents; // number of events in the packet
      word DeltaTime; // time units since the previous packet
   };

A packet can contain up to 65535 events. DeltaTime specifies the number of time units between the previous packet and the current one. If DeltaTime is 0, the current packet takes place one time unit after the previous packet.
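A minimal sketch of turning the DeltaTime fields into absolute positions, reading the rule above literally (a DeltaTime of 0 counts as one time unit). The function names and the choice to place the first packet at time 0 are assumptions: the text does not say what the first packet's DeltaTime is measured from.

```c
#include <stddef.h>
#include <stdint.h>

/* Time units represented by one DeltaTime field: 0 means "one unit later". */
static uint32_t delta_units(uint16_t delta_time)
{
    return delta_time == 0 ? 1u : (uint32_t)delta_time;
}

/* Hypothetical helper: fill out[i] with the absolute time (in time units)
   of packet i. Assumption: the first packet sits at time 0, so its own
   DeltaTime is ignored. */
void packet_times(const uint16_t *delta_times, size_t count, uint32_t *out)
{
    uint32_t now = 0;
    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            now += delta_units(delta_times[i]);
        out[i] = now;
    }
}
```

For DeltaTime values {7, 4, 0, 2} this yields packet times 0, 4, 5 and 7.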

Time units are derived from the current tempo, which is the same as BPM in modules. The time-unit frequency is 2*Tempo/5 Hz. As tempo varies between 1 and 256, the time-unit frequency ranges from 0.4Hz (period T=2.5s) to 102.4Hz (T=9.77ms).
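The formula above can be sketched as a few helpers (the names are illustrative, not part of the format). They assume a constant tempo; a player would recompute the period whenever a Tempo event arrives.

```c
/* Time-unit frequency in Hz for a given tempo (same as module BPM). */
double time_unit_hz(unsigned tempo)
{
    return 2.0 * tempo / 5.0;
}

/* Length of one time unit in milliseconds. */
double time_unit_ms(unsigned tempo)
{
    return 1000.0 / time_unit_hz(tempo);
}

/* Duration in milliseconds of `units` time units at a constant tempo. */
double units_to_ms(unsigned long units, unsigned tempo)
{
    return units * time_unit_ms(tempo);
}
```

At Tempo 125 one time unit lasts 20 ms (50 Hz), the same as one tick of a module at 125 BPM.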

Notes range between 0 and 95 (C-0..B-7). Note 96 is an off-note - it ends the sound.
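The note numbering maps directly onto octaves of 12 halftones. A small sketch that prints tracker-style names; the function name and the "OFF" label for note 96 are illustrative choices.

```c
#define NOTE_OFF 96

/* Write "C-4", "C#4", ... for notes 0..95 and "OFF" for 96.
   buf must hold 4 bytes. Returns 0 on success, -1 if out of range. */
int note_name(int note, char buf[4])
{
    static const char names[12][3] = {
        "C-", "C#", "D-", "D#", "E-", "F-", "F#", "G-", "G#", "A-", "A#", "B-"
    };
    if (note == NOTE_OFF) {
        buf[0] = 'O'; buf[1] = 'F'; buf[2] = 'F'; buf[3] = '\0';
        return 0;
    }
    if (note < 0 || note > 95)
        return -1;
    buf[0] = names[note % 12][0];
    buf[1] = names[note % 12][1];
    buf[2] = (char)('0' + note / 12);   /* octave 0..7 */
    buf[3] = '\0';
    return 0;
}
```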

Each event, unless it is a global event, applies to one of 256 channels. The composer does not have to specify which channel an instrument is played on; channels are assigned by the composing tool (I hesitate to call it a tracker).
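How the composing tool assigns channels is left open. One simple possibility, sketched here as a hypothetical `ChannelMap` with first-free allocation, is to give each new note the lowest free channel and release it when the tool emits the note-off:

```c
#include <stdbool.h>

#define NUM_CHANNELS 256   /* the Channel field of an event is one byte */

/* Hypothetical tool-side state: which channels are currently sounding. */
typedef struct {
    bool busy[NUM_CHANNELS];
} ChannelMap;

/* Return the lowest free channel and mark it busy, or -1 if all are in use. */
int channel_alloc(ChannelMap *map)
{
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if (!map->busy[ch]) {
            map->busy[ch] = true;
            return ch;
        }
    }
    return -1;
}

/* Free a channel, e.g. after a note-off (note 96) has been written for it. */
void channel_free(ChannelMap *map, int ch)
{
    if (ch >= 0 && ch < NUM_CHANNELS)
        map->busy[ch] = false;
}
```

Reusing the lowest free channel keeps the number of channels a player must mix as small as possible.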

Instruments

An instrument consists of a sample dispatch table, default vibrato parameters and envelopes.

The composer uses the sample dispatch table to assign a sample, from a pool of up to 65536 samples, to each note. Samples can be shared between instruments. This enables the composer to create several instruments with the same samples that sound different (e.g. because they use different envelopes).

Each sample in a song consists of sample data and some additional parameters, like loop control and volume scale.

Inside a song file, envelopes, like samples, reside in a separate group and are not part of instruments. Instruments hold indexes into the envelope table. There are two kinds of envelopes: 1D envelopes can be used both as volume and as position envelopes, while 3D envelopes can be used only as position envelopes.

Position envelopes extend the term "panning". Specifying a 1D envelope for position gives simple panning on an instrument, while specifying a 3D envelope makes the sound really 3D. If the player does not support 3D sound, it converts 3D envelopes to simple 1D panning envelopes.
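The conversion from 3D to simple panning is not specified. The sketch below is one possible fallback, not the format's definition: it assumes X points to the listener's right and Z forward, ignores height (Y), and maps the horizontal direction to a pan value from 1 (left) through 128 (centre) to 255 (right), using integer arithmetic only.

```c
#include <stdint.h>

/* Hypothetical fallback: pan value from one 3D envelope point.
   Axis convention (X = right, Z = forward) is an assumption. */
uint8_t pan_from_3d(int32_t x, int32_t y, int32_t z)
{
    (void)y;                                   /* height does not affect stereo */
    int64_t ax = x < 0 ? -(int64_t)x : x;
    int64_t az = z < 0 ? -(int64_t)z : z;
    if (ax + az == 0)
        return 128;                            /* on the vertical axis: centre */
    /* x / (|x| + |z|) lies in [-1, 1], so the result lies in [1, 255] */
    return (uint8_t)(128 + (127 * (int64_t)x) / (ax + az));
}
```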

There is also the idea of synthetic instruments. These would have wave generators instead of sample dispatch tables, and would include modulation envelopes.

Song file format

A song is stored in a kind of IFF format. Refer to TAD's article in Hugi #17 for the advantages of this format.

The file begins with an eight-byte identifier: the ASCII characters 'FORMSONG'. It is followed by data chunks. Each chunk begins with the following header:


   struct _CHUNK {
      char ID[4];      // 4 ASCII characters - chunk type
      dword ChunkSize; // Size of the chunk without the header
   };
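A minimal sketch of walking the chunks of a song file loaded into memory. Two assumptions are labeled in the code: the byte order of ChunkSize is taken to be little-endian (the text does not say, and classic IFF is big-endian), and ChunkSize is taken to already include the 8-byte padding, so a reader simply skips 8 + ChunkSize bytes. `walk_chunks` and `ChunkVisitor` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian byte order is an assumption, not stated by the format. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef void (*ChunkVisitor)(const char id[4], const uint8_t *data,
                             uint32_t size, void *user);

/* Visit every chunk after the 'FORMSONG' signature. Returns the number of
   chunks, or -1 if the signature is missing or a chunk overruns the file. */
long walk_chunks(const uint8_t *file, size_t len, ChunkVisitor visit, void *user)
{
    if (len < 8 || memcmp(file, "FORMSONG", 8) != 0)
        return -1;
    size_t pos = 8;
    long count = 0;
    while (pos + 8 <= len) {
        uint32_t size = read_le32(file + pos + 4);
        if (size > len - pos - 8)
            return -1;                     /* truncated or corrupt chunk */
        if (visit)
            visit((const char *)(file + pos), file + pos + 8, size, user);
        pos += 8 + (size_t)size;           /* assumes size is already padded */
        count++;
    }
    return count;
}
```

Unknown chunk IDs are passed to the visitor like any other, so a player can ignore chunks written by a composition utility.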

There are six basic, pre-defined types of chunks:
- 'DESC' - song description structure,
- 'INFO' - composer's text,
- 'INST' - instrument,
- 'ENVL' - envelope,
- 'SAMP' - sample,
- 'STRM' - stream, i.e. song data.

The size of every chunk in the file should be a multiple of 8 bytes, so that chunk data stays aligned; most processors access aligned data faster.
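On the writing side, the 8-byte rule comes down to rounding the payload up and appending zero bytes. This sketch assumes that ChunkSize stores the padded size and that the size field is little-endian; neither is stated explicitly, and `write_chunk` is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Round a chunk size up to the next multiple of 8. */
uint32_t pad8(uint32_t size)
{
    return (size + 7u) & ~7u;
}

/* Write header, payload and zero padding. Returns 0, or -1 on I/O error. */
int write_chunk(FILE *out, const char id[4], const void *data, uint32_t size)
{
    static const uint8_t zeros[8] = {0};
    uint32_t padded = pad8(size);
    uint8_t hdr[8] = {
        (uint8_t)id[0], (uint8_t)id[1], (uint8_t)id[2], (uint8_t)id[3],
        (uint8_t)padded, (uint8_t)(padded >> 8),          /* little-endian, */
        (uint8_t)(padded >> 16), (uint8_t)(padded >> 24)  /* an assumption  */
    };
    if (fwrite(hdr, 1, 8, out) != 8)
        return -1;
    if (size && fwrite(data, 1, size, out) != size)
        return -1;
    if (padded != size && fwrite(zeros, 1, padded - size, out) != padded - size)
        return -1;
    return 0;
}
```

Because the header is itself 8 bytes, padding the payload keeps every chunk header 8-byte aligned within the file.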

Chunks 'DESC', 'INFO' and 'STRM' occur only once per song. There can be more than one song in a file. Each subsequent 'DESC', 'INFO' or 'STRM' chunk belongs to the next song (e.g. the fifth 'STRM' chunk belongs to the fifth song in the file). The 'DESC' structure has a field that specifies the number of instruments, so 'INST' chunks are not shared between songs. There can be up to 65536 envelope definitions and up to 65536 samples in a file. Both 'ENVL' and 'SAMP' chunks can be shared between songs.

An example chunk layout of a file:


     -------------------------------------------------------------
     |SAMP|SAMP|ENVL|DESC|DESC|STRM|INFO|INFO|STRM|INST|INST|INST|
     -------------------------------------------------------------

There are 2 samples and 1 envelope in this file, all shared between the two songs in the file. Chunks may appear in any order; the first 'STRM' chunk belongs to the first song, and the second one to the second song. If the first song has two instruments and the second song one instrument, the first two instruments in the file belong to the first song and the third instrument to the second song.
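The ownership rules above can be sketched as a single pass over the chunk IDs in file order. The per-song instrument counts (`num_inst`) are assumed to have been read from the 'DESC' chunks beforehand; `assign_owners` is a hypothetical name.

```c
#include <string.h>

/* owner[i] receives the song index of chunk i, or -1 for shared 'SAMP'/'ENVL'
   chunks and unknown IDs. Returns 0, or -1 if there are more 'INST' chunks
   than the NumInstruments fields allow. */
int assign_owners(const char (*ids)[4], int count,
                  const int *num_inst, int num_songs, int *owner)
{
    int desc = 0, info = 0, strm = 0;   /* n-th chunk of a type -> n-th song */
    int inst_song = 0, inst_used = 0;   /* 'INST' chunks are consumed in order */
    for (int i = 0; i < count; i++) {
        owner[i] = -1;
        if (memcmp(ids[i], "DESC", 4) == 0)
            owner[i] = desc++;
        else if (memcmp(ids[i], "INFO", 4) == 0)
            owner[i] = info++;
        else if (memcmp(ids[i], "STRM", 4) == 0)
            owner[i] = strm++;
        else if (memcmp(ids[i], "INST", 4) == 0) {
            /* skip songs whose instruments are already complete */
            while (inst_song < num_songs && inst_used == num_inst[inst_song]) {
                inst_song++;
                inst_used = 0;
            }
            if (inst_song == num_songs)
                return -1;
            owner[i] = inst_song;
            inst_used++;
        }
    }
    return 0;
}
```

Applied to the example layout with instrument counts {2, 1}, both 'DESC', 'INFO' and 'STRM' pairs map to songs 0 and 1, the first two 'INST' chunks to song 0 and the last one to song 1.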

A lot of mess, huh? Typically one song is stored in a file, and then there is no problem. But many composers reuse the same samples in their tunes, so putting several songs in one file lets them share envelopes and samples, which reduces the overall size. What's more, such multi-song files compress much better. (I tried this with my own modules: 13 modules totalling 2.5MB compressed down to 550KB.)

The IFF format enables the composer and the composition utility to store other data with a song. Since song data is simply a sequence of events, it is not organized into bigger structures such as patterns. The composer may still divide a song into larger parts, and the composition utility may store that information in its own chunks. Finally, a song may be compressed with some compression algorithm, and information about this can also be stored in a chunk.

There are no sample or instrument names, as there were in typical modules. The composition utility may record which files the samples and instruments came from in its own chunks. Composers used to put text in place of instrument names; the 'INFO' chunk serves this purpose. Composers can now also include images with their tunes.

Chunk descriptions

A 'DESC' chunk:


struct _DESC {
   word Version;             // Song format version
   word CompatibleVersion;   // Last version with which it is compatible
   byte NumInstruments;      // Number of instruments used in the song
   byte _alignment[3];       // Padding so that the next field is aligned
   dword RestartPosition;    // Song restart position (in time units)
   char Title[32];           // Song title
   char Composer[32];        // Who composed the song
   char Tracker[16];         // "Tracker" name
};
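To check the layout, here is the same structure with the article's types mapped to `<stdint.h>` (an assumption: byte, word and dword are 8, 16 and 32 bits). The explicit `_alignment` bytes keep every field naturally aligned, so common compilers insert no hidden padding.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t Version;
    uint16_t CompatibleVersion;
    uint8_t  NumInstruments;
    uint8_t  _alignment[3];
    uint32_t RestartPosition;    /* in time units */
    char     Title[32];
    char     Composer[32];
    char     Tracker[16];
} SongDesc;

/* 2 + 2 + 1 + 3 + 4 + 32 + 32 + 16 = 92 bytes */
_Static_assert(offsetof(SongDesc, RestartPosition) == 8, "RestartPosition at 8");
_Static_assert(sizeof(SongDesc) == 92, "DESC payload is 92 bytes");
```

Since 92 is not a multiple of 8, a writer following the 8-byte rule would pad this chunk to 96 bytes.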

A 'SAMP' chunk, with the sample data at its end. The sample data is stored as signed delta values, as in XM modules, because deltas compress better.


struct _SAMP {
   byte Volume;         // Volume (0..255)
   sbyte Note;          // Relative note (-96..95; 0 => C-4 plays as C-4)
   sbyte Finetune;      // Finetune (-128..127; 128 = 1 halftone)
   byte Flags;          // Loop type, data type
   dword LoopStart;     // Loop start (index into the sample, not bytes)
   dword LoopLength;    // Loop length in samples, not bytes
   dword Length;        // Sample length in samples, not bytes
   byte Data[1];        // Sample data...
};
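The delta scheme stores each sample as the difference from the previous one, starting from 0; smooth waveforms then produce small numbers that compress well. A minimal in-place sketch for 8-bit data (other data types selected by Flags are not covered), with arithmetic wrapping modulo 256:

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute 8-bit samples -> signed deltas, in place. */
void delta_encode8(int8_t *s, size_t n)
{
    int8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        int8_t cur = s[i];
        s[i] = (int8_t)(uint8_t)((uint8_t)cur - (uint8_t)prev); /* wraps mod 256 */
        prev = cur;
    }
}

/* Signed deltas -> absolute 8-bit samples, in place. */
void delta_decode8(int8_t *s, size_t n)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (uint8_t)(acc + (uint8_t)s[i]);                   /* wraps mod 256 */
        s[i] = (int8_t)acc;
    }
}
```

For example, the samples {10, 5, 8, -120} are stored as {10, -5, 3, -128} and decode back unchanged.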

An 'ENVL' chunk:


struct _ENVL {
   byte Type;           // 0 - 1D envelope, 1 - 3D envelope
   byte NumPoints;      // Number of points in the envelope
   byte Sustain;        // Position of sustain (point index)
   byte LoopStart;      // Loop start point index
   byte LoopEnd;        // Loop end point index
   byte _alignment;
   word Fadeout;        // Analogous to XM
   _P Points[1];        // Points: _1DPOINT or _3DPOINT, depending on Type
};

For a 1D envelope, the point structure is:


   struct _1DPOINT {
      byte Value;
      byte _alignment;
      word Position;
   };

And for a 3D envelope:


   struct _3DPOINT {
      sdword X, Y, Z;      // 3D position - signed dwords
      byte _alignment[2];
      word Position;
   };
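A player evaluates a 1D envelope by interpolating between the two points around the current position. The sketch assumes linear interpolation, as in XM envelopes, and that Position is measured in time units; neither is stated above. Sustain and loop handling are left out.

```c
#include <stdint.h>

typedef struct {
    uint8_t  Value;
    uint8_t  _alignment;
    uint16_t Position;   /* assumed to be in time units */
} Point1D;

/* Envelope value at time `pos`. Points must be sorted by Position; before
   the first point or after the last one, the end value holds. */
int env1d_value(const Point1D *pts, int num_points, unsigned pos)
{
    if (num_points <= 0)
        return 0;
    if (pos <= pts[0].Position)
        return pts[0].Value;
    for (int i = 1; i < num_points; i++) {
        if (pos <= pts[i].Position) {
            /* here pts[i-1].Position < pos <= pts[i].Position, so no division by 0 */
            unsigned p0 = pts[i - 1].Position, p1 = pts[i].Position;
            int v0 = pts[i - 1].Value, v1 = pts[i].Value;
            return v0 + (int)((v1 - v0) * (long)(pos - p0) / (long)(p1 - p0));
        }
    }
    return pts[num_points - 1].Value;
}
```

A 3D envelope could be evaluated the same way, interpolating X, Y and Z separately.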

And an 'INST' chunk:


struct _INST {
   byte Type;                 // 0 for instrument with samples
   byte VibratoWaveform;      // Related to default vibrato...
   byte VibratoSpeed;
   byte VibratoAmplitude;
   word VibratoSweep;
   word VolumeEnvelope;       // Index of the envelope used for volume
   word PositionEnvelope;     // Index of the position envelope
   word SampleDispatch[96];   // Sample index for each note
};
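When a Note event starts an instrument, the player looks the note up in SampleDispatch. This sketch assumes the stored value is an index into the file's 'SAMP' chunks in file order; `InstDispatch` and `dispatch_sample` are illustrative names.

```c
#include <stdint.h>

#define NOTE_OFF   96
#define NO_SAMPLE  (-1)

/* Only the part of 'INST' needed here; word mapped to uint16_t (assumed). */
typedef struct {
    uint16_t SampleDispatch[96];   /* sample index for notes C-0..B-7 */
} InstDispatch;

/* Sample to play for `note`, or NO_SAMPLE for a note-off or invalid note. */
long dispatch_sample(const InstDispatch *inst, int note)
{
    if (note < 0 || note >= NOTE_OFF)
        return NO_SAMPLE;
    return inst->SampleDispatch[note];
}
```

Two instruments whose tables contain the same indices share those samples, while keeping their own envelopes and vibrato settings.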

Events

As mentioned before, the 'STRM' chunk (song data) consists of a stream of events. Events at the same position in time are grouped into packets (the packet structure is described above). Each event has the following structure:


   struct _EVENT {
      byte Command;   // Identifies the event
      byte Channel;   // Channel on which the event occurred
                      // (unless global)
      byte A;         // First parameter of the event
      byte B;         // Second parameter
   };
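Putting packets and events together, a player can walk a 'STRM' payload as sketched below. Assumptions: little-endian words, the first packet at time 0, and DeltaTime 0 meaning one unit, as described earlier. Command numbers are not assigned in the text, so the walker does not recognize EndStream; it only checks that the data ends on a packet boundary.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t Command, Channel, A, B;
} Event;

typedef void (*EventHandler)(uint32_t time, const Event *ev, void *user);

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Returns the number of packets, or -1 if the data ends inside a packet. */
long walk_stream(const uint8_t *data, size_t len, EventHandler on_event, void *user)
{
    size_t pos = 0;
    uint32_t time = 0;
    long packets = 0;
    while (pos + 4 <= len) {
        uint16_t num_events = rd16(data + pos);
        uint16_t delta = rd16(data + pos + 2);
        pos += 4;
        if (packets > 0)
            time += delta ? delta : 1;        /* DeltaTime 0 = one time unit */
        if ((size_t)num_events * 4 > len - pos)
            return -1;
        for (uint16_t i = 0; i < num_events; i++, pos += 4) {
            Event ev = { data[pos], data[pos + 1], data[pos + 2], data[pos + 3] };
            if (on_event)
                on_event(time, &ev, user);
        }
        packets++;
    }
    return pos == len ? packets : -1;
}
```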

Currently the following events are defined, each written as EventName (ParamA, ParamB):

Note (Note, Instrument)
Starts playing an instrument. If the note is 96, this is a NoteOff event. Volume and position are initialized to sample defaults.

Volume (Volume, Speed)
Slides the volume towards the given volume by Speed/4 per time unit. If Speed is 255, the volume is set immediately.

GlobalVolume (Volume, Speed)
Similar to the above, but with global scope.

Portamento (Amount, ANote)
Slides the pitch of the sound towards the given note. The most significant bit of ANote serves as the 9th bit of Amount, leaving 7 bits for the note. The pitch slides by Amount/16 notes per time unit.

Vibrato (Amplitude, Speed)
Sets vibrato parameters to the given values. If either parameter is 255, it is set to its default value from the instrument.

Tremolo (Amplitude, Speed)
Similar to vibrato, but affects volume instead of pitch.

GlobalTremolo (Amplitude, Speed)
Guess what...

WaveformControl (What, Waveform)
Sets waveform of vibrato, tremolo or global tremolo.

Tempo (Tempo, 0)
Sets tempo.

VolumeEnvelope (EnvLo, EnvHi)
Sets new volume envelope.

PositionEnvelope (EnvLo, EnvHi)
Sets new position envelope.

VolEnvPosition (PosLo, PosHi)
Sets position of volume envelope.

PosEnvPosition (PosLo, PosHi)
Sets position of position envelope.

EndStream (0, 0)
Indicates the end of the stream. It should be placed at the end of the stream, or a player may think the data is corrupt.
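A few of the parameter rules above translate into small helpers. They are sketches with illustrative names: the Volume slide keeps the current volume in quarter steps (vol4 = volume * 4) so that Speed/4 per time unit stays exact, and the Lo/Hi pairs of the envelope events are joined into one 16-bit value.

```c
#include <stdint.h>

/* Portamento: the MSB of ANote is bit 8 of Amount, so Amount has 9 bits
   (0..511) and the target note the remaining 7 bits. */
void decode_portamento(uint8_t a, uint8_t b, unsigned *amount, unsigned *note)
{
    *amount = a | ((unsigned)(b & 0x80) << 1);
    *note = b & 0x7F;
}

/* EnvLo/EnvHi and PosLo/PosHi pairs -> one 16-bit index or position. */
uint16_t join_lo_hi(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | hi << 8);
}

/* One time unit of a Volume slide, with vol4 = volume * 4.
   Speed 255 jumps straight to the target. Returns the new vol4. */
int volume_slide_step(int vol4, uint8_t target, uint8_t speed)
{
    int goal = target * 4;
    if (speed == 255)
        return goal;
    if (vol4 < goal)
        return vol4 + speed < goal ? vol4 + speed : goal;
    return vol4 - speed > goal ? vol4 - speed : goal;
}
```

For example, parameters A = 0x10 and B = 0xB0 of a Portamento event decode to Amount 272 and note 48 (C-4), a slide of 17 notes per time unit.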

A composition utility may define other events, for example 'PatternBreak', to make it easy for the composer to navigate through the tune. The user may also want to define events for synchronization when the song is used in a demo.

There can also be effects like reverb, chorus, etc. The format is really flexible. A song in this format can potentially be big (four bytes for each note and effect), but it is very simple to handle and compresses very well.

Let's sum it up

This song format has some obvious advantages:
- freedom of placing notes,
- 256 channels,
- many possibilities for a "tracker" interface,
- both modules and MIDIs can be imported,
- versatility and expandability,
- up to 65536 samples can be shared between 256 instruments,
- the same for envelopes,
- placing sounds in 3D space,
- and more...

Implementation

There are still some open problems with the above format, for example how the events should be grouped: by time of occurrence (as above) or by channel.

Even better would be grouping the events into "tracks" and assigning instruments to the tracks. There would be events like "note on" and "note off", and channels would be assigned automatically by the player, so several notes could play at once in one track, with virtually no limit on the number of channels. This would work exactly like MIDI. Still, MIDI is a hardware standard that, e.g., tells how a note is started (slow/fast) but not what its volume should be, and this should be defined by the format, as in modules.

For me, the most important thing would be a small player (possibly written entirely in assembly), freely distributed with the composing utility as an OBJ file, so that people can actually take advantage of the format.

The biggest problem is the composition utility: how should it look and work, and what features should it have? Editing a song by tracks or, better, by instruments should probably be the basis (unlike in modules). The instrument, sample and envelope editors should have some advanced features, including a sample generator. Nowadays it would be very important to save songs in MP3 format (i.e. mixing them to WAV and then compressing), and support for "final-touch" effects (like echo, reverb and frequency filters) should also be available.

What do YOU think?

I am waiting for your response, whatever your skill level, especially if you would like to join a project to build a composition utility for this format.

If you are a musician, let me know what you think about all this. Would it really be another sound revolution, or just another tracker? And maybe you have ideas for improving this format?

Chris Dragan